ABSTRACT


Kohonen's "feature maps" approach to clustering is often likened to the k or c-means clustering algorithms. In 
this note we identify some similarities and differences between the hard and fuzzy c-Means (HCM/FCM) or 
ISODATA algorithms and Kohonen's "self-organizing" (KSO) approach. We conclude that some differences 
are significant, but at the same time there may be some important unknown relationship(s) between the two 
methodologies. We propose several avenues of research which, if successfully resolved, would strengthen 
both the HCM/FCM and Kohonen clustering models. In this note we do not address aspects of the KSO 
method related to associative memory or to the feature map display technique. 

1. INTRODUCTION 

Treatments of many classical approaches to clustering appear in Kohonen [1], Bezdek [2], and Duda and Hart 
[3]. Kohonen's work has become particularly timely in recent years because of the widespread resurgence of 
interest in Artificial Neural Network (ANN) structures. ANNs and pattern recognition are discussed by Pao [4] 
and Lippmann [5]. Our interest lies with the KSO algorithm as it relates to the solution of clustering and 
classification problems and the HCM/FCM models. 

2. CLUSTERING ALGORITHMS AND CLASSIFIER DESIGN 

Let $c$ be an integer, $1 < c < n$, and let $X = \{x_1, x_2, \ldots, x_n\}$ denote a set of $n$ feature vectors in $\mathbb{R}^s$. $X$ is 
numerical object data: the $k$-th object (some physical entity such as a medical patient, seismic record, etc.) has 
vector $x_k$ as its numerical representation, and $x_{kj}$ is the $j$-th characteristic (or feature) associated with object $k$. Given 
$X$, we say that $c$ fuzzy subsets $\{u_i : X \to [0,1]\}$ are a fuzzy c-partition of $X$ in case the $cn$ values $\{u_{ik} = u_i(x_k),\ 1 \le k \le n,\ 1 \le i \le c\}$ 
satisfy three conditions: 

$$0 \le u_{ik} \le 1 \quad \text{for all } i, k \qquad (1a)$$

$$\sum_{i=1}^{c} u_{ik} = 1 \quad \text{for all } k \qquad (1b)$$

$$0 < \sum_{k=1}^{n} u_{ik} < n \quad \text{for all } i \qquad (1c)$$

Each set of $cn$ values satisfying conditions (1) can be arrayed as a $(c \times n)$ matrix $U = [u_{ik}]$. The set of all such 
matrices comprises the non-degenerate fuzzy c-partitions of $X$: 

$$M_{fcn} = \{U \in \mathbb{R}^{cn} \mid u_{ik} \text{ satisfies (1) for all } i \text{ and } k\} \qquad (2)$$

And in case all the $u_{ik}$ are either 0 or 1, we have the subset of hard (or crisp) c-partitions of $X$: 

$$M_{cn} = \{U \in M_{fcn} \mid u_{ik} = 0 \text{ or } 1 \text{ for all } i \text{ and } k\} \qquad (3)$$
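As a small illustration of definitions (1) to (3), the following Python sketch tests whether a candidate matrix is a non-degenerate fuzzy or hard c-partition. Storing $U$ as a list of $c$ rows and comparing floating-point values with a tolerance `tol` are our own choices, not part of the definitions.

```python
def is_fuzzy_partition(U, tol=1e-9):
    """Check conditions (1a)-(1c) for a c x n membership matrix U given as a list of rows."""
    c, n = len(U), len(U[0])
    # (1a): every membership u_ik lies in [0, 1]
    if any(u < -tol or u > 1 + tol for row in U for u in row):
        return False
    # (1b): the memberships of each data point x_k (one column) sum to 1
    if any(abs(sum(U[i][k] for i in range(c)) - 1) > tol for k in range(n)):
        return False
    # (1c): no cluster is empty (row sum > 0) and none absorbs all n points (row sum < n)
    return all(tol < sum(row) < n - tol for row in U)


def is_hard_partition(U, tol=1e-9):
    """U is in M_cn: a non-degenerate c-partition whose entries are all 0 or 1."""
    return is_fuzzy_partition(U, tol) and all(
        min(abs(u), abs(u - 1)) <= tol for row in U for u in row
    )


print(is_hard_partition([[1, 1, 0], [0, 0, 1]]))              # True
print(is_fuzzy_partition([[0.7, 0.2, 0.5], [0.3, 0.8, 0.5]]))  # True
print(is_hard_partition([[0.7, 0.2, 0.5], [0.3, 0.8, 0.5]]))   # False
print(is_fuzzy_partition([[1, 1, 1], [0, 0, 0]]))              # False: violates (1c)
```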

The reason these matrices are called partitions follows from the interpretation of $u_{ik}$ as the membership of $x_k$ in 
the $i$-th partitioning subset (cluster) of $X$. $M_{fcn}$ is more realistic as a physical model than $M_{cn}$, for it is common 
experience that the boundaries between many classes of real objects are in fact very badly delineated (i.e., 
really fuzzy). The important point is that all clustering algorithms generate solutions to the clustering problem 
for $X$ which are matrices in $M_{fcn}$. The clustering problem for $X$ is, quite simply, the identification of an "optimal" 
partition $U$ of $X$ in $M_{fcn}$; that is, one that groups together object data vectors (and hence the objects they 
represent) which share some well defined (mathematical) similarity. It is our hope and implicit belief, of course, 
that an optimal mathematical grouping is in some sense an accurate portrayal of natural groupings in the 
physical process from whence the object data are derived. The number of clusters $c$ must be known, or 
becomes an integral part of the problem. 

3. THE ISODATA AND KSO ALGORITHMS 

The most well known objective function for clustering is the least total squared error function: 

$$J_1(U, v; X) = \sum_{k=1}^{n}\sum_{i=1}^{c} u_{ik}\,(\|x_k - v_i\|_I)^2 \qquad (4)$$

where $v = (v_1, v_2, \ldots, v_c)$ is a vector of (unknown) cluster centers (weights or prototypes), $v_i \in \mathbb{R}^s$ for $1 \le i \le c$, 
$U \in M_{cn}$ is an unknown hard c-partition of $X$, and $\|\cdot\|_I$ is the Euclidean norm on $\mathbb{R}^s$. Optimal partitions $U^*$ of $X$ 
are taken from pairs $(U^*, v^*)$ that are "local minimizers" of $J_1$. It is important to recognize the geometric impact 
that the use of a norm function as the criterion of (dis)similarity has on "good clusters" (here $\|\cdot\|_I$, but 
more generally, any norm on $\mathbb{R}^s$ induced by a positive definite weight matrix $A$, as described below). Figure 1 
illustrates this graphically; partitions that optimize $J_1$ will, generally speaking, contain clusters that conform to 
the topology that is induced on $\mathbb{R}^s$ by the eigenstructure of the norm-inducing matrix $A$. When $A = I$, good 
clusters will be hyperspherical, as the one in the left portion of Figure 1; otherwise, they will be hyperelliptical, 
as the one on the right side of Figure 1. 
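A direct transcription of (4) may help fix the notation. In this minimal sketch (plain Python, Euclidean norm, $U$ stored as a list of $c$ rows and the prototypes as a list of tuples, all our own choices), a partition that groups nearby points yields a much smaller $J_1$ than one that does not:

```python
def sq_euclidean(x, v):
    """(||x - v||_I)^2, the squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, v))


def j1(U, V, X):
    """Least total squared error (4): sum over k and i of u_ik * (||x_k - v_i||_I)^2."""
    return sum(U[i][k] * sq_euclidean(X[k], V[i])
               for i in range(len(V)) for k in range(len(X)))


X = [(0, 0), (0, 2), (10, 0)]
good = j1([[1, 1, 0], [0, 0, 1]], [(0, 1), (10, 0)], X)  # prototypes are the cluster means
bad = j1([[1, 0, 0], [0, 1, 1]], [(0, 0), (5, 1)], X)    # a poor split of the same data
print(good, bad)  # 2 52
```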

Figure 1. Geometry of Cluster Formation In Norm-Driven Clustering Algorithms 



As is evident in Figure 1, clusters that optimize $J_1$ are formed on the basis of two properties: location and 
shape. Location information is contained in the lengths of the data vectors and "cluster centers" or prototypes 
$\{v_i\}$ from the origin, whilst shape information is embedded in the topology induced by the norm in use. 

Roughly speaking, these correspond to the mean and variance of probability distributions, so (4) is in some 
sense analogous to regarding the data as being drawn from a mixture of probability density functions (indeed, 
there are special cases when (4) yields identical results to the maximum likelihood estimators of the parameters 
of a mixture of normal distributions). Although the norm shown in (4) is the Euclidean norm, generalizations of 
$J_1$ have used all five of the usual norms encountered in numerical analysis and pattern recognition, viz., the 
Euclidean, Diagonal and Mahalanobis inner product A-norms; and the $p = 1$ and $p = \infty$ (city block and sup) 
Minkowski norms. The defining equations and unit ball shapes for these two families of norms are shown in 
Figure 2. 
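The five norms named above can be written compactly. In this sketch the Euclidean, Diagonal and Mahalanobis norms are all instances of `a_norm` (with $A = I$, a positive diagonal $A$, and the inverse of a sample covariance matrix, respectively), while `minkowski_norm` covers $p = 1$ and $p = \infty$; the weight matrices in the usage lines are made-up examples.

```python
import math


def a_norm(x, A):
    """Inner product A-norm ||x||_A = sqrt(x^T A x); A must be positive definite (s x s)."""
    s = len(x)
    return math.sqrt(sum(x[p] * A[p][q] * x[q] for p in range(s) for q in range(s)))


def minkowski_norm(x, p):
    """Minkowski p-norm; p = 1 is the city block norm and p = math.inf the sup norm."""
    if p == math.inf:
        return max(abs(t) for t in x)
    return sum(abs(t) ** p for t in x) ** (1.0 / p)


x = (3.0, 4.0)
print(a_norm(x, [[1, 0], [0, 1]]))   # Euclidean (A = I): 5.0
print(a_norm(x, [[1, 0], [0, 4]]))   # a diagonal A-norm: sqrt(73)
print(minkowski_norm(x, 1))          # city block: 7.0
print(minkowski_norm(x, math.inf))   # sup: 4.0
```

A point lies in the unit ball of a given norm exactly when its norm is at most 1, which is how the level-set shapes of Figure 2 arise.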

As an explicit means for finding optimal partitions of object data, $J_1$ was popularized as part of the ISODATA 
("Iterative Self-Organizing Data Analysis") algorithm (c-Means + Heuristics) by Ball and Hall [6] in 1967. It is 
interesting to note that Kohonen apparently first used the term "self-organizing" to describe his approach 
about 15 years later [1]. Apparently, the feature of both algorithms that suggests this phrase is their ability to 
iteratively adjust the weight vectors or prototypes that subsequently represent the data in an orderly and 
improving manner as the algorithms proceed with iteration. We contend that this use of the term 
"self-organizing" in the current context of neural network research is somewhat misleading (in both cases). Indeed, 
if the aspect of FCM/HCM and KSO that entitles us to call them self-organizing is their ability to adjust their 
parameters during "training", then every iterative method that produces approximations from data is 
self-organizing (e.g., Newton's method!). On the other hand, if this term serves to indicate that the algorithms in 
question can find meaningful labels for objects without external interference (labeled training examples), 
then all clustering algorithms are "self-organizing". Since the terminology in both cases is well established, the 
only expectation this writer has about the efficacy of these remarks is that they caution readers to take the 
semantics associated with much of the current neural network literature with a large grain of salt. 


Figure 2. Geometry of Level Sets for Inner Product A-norms and Minkowski p-norms 
[Panels: "Unit Ball Shapes in the A-norms", $B_A = \{x : \langle x, x\rangle_A = x^T A x = (\|x\|_A)^2 \le 1\}$; "Unit Ball Shapes in the p-norms"]
Dunn [7] first generalized $J_1$ by allowing $U$ to be fuzzy ($m = 2$ below) and the norm to be an arbitrary inner 
product A-norm. Bezdek [8] generalized Dunn's functional to the fuzzy ISODATA family written as: 

$$J_m(U, v; X) = \sum_{k=1}^{n}\sum_{i=1}^{c} (u_{ik})^m (\|x_k - v_i\|_A)^2 \qquad (5)$$

where $m \in [1, \infty)$ is a weighting exponent on each fuzzy membership; $U \in M_{fcn}$ is a fuzzy c-partition of $X$; 
$v = (v_1, v_2, \ldots, v_c)$ are cluster centers in $\mathbb{R}^s$; $A$ is any positive definite $(s \times s)$ matrix; and 
$(\|x_k - v_i\|_A)^2 = (x_k - v_i)^T A (x_k - v_i)$ is the squared distance (in the A-norm) from $x_k$ to $v_i$. 
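The functional (5) differs from (4) only in the exponent $m$ and the weight matrix $A$. This sketch (our own, plain Python) evaluates it directly; the usage lines check that $m = 1$, $A = I$ and a hard $U$ reproduce a value of $J_1$, and show one point shared equally by two clusters.

```python
def sq_a_dist(x, v, A):
    """(||x - v||_A)^2 = (x - v)^T A (x - v)."""
    d = [a - b for a, b in zip(x, v)]
    s = len(d)
    return sum(d[p] * A[p][q] * d[q] for p in range(s) for q in range(s))


def jm(U, V, X, m, A):
    """Fuzzy ISODATA functional (5): sum over k and i of (u_ik)^m (||x_k - v_i||_A)^2."""
    return sum(U[i][k] ** m * sq_a_dist(X[k], V[i], A)
               for i in range(len(V)) for k in range(len(X)))


I2 = [[1, 0], [0, 1]]
# With m = 1, A = I and a hard U, (5) reduces to J_1 in (4).
print(jm([[1, 1, 0], [0, 0, 1]], [(0, 1), (10, 0)], [(0, 0), (0, 2), (10, 0)], 1, I2))  # 2
# One point shared equally by two clusters, m = 2: 0.25 * 1 + 0.25 * 4
print(jm([[0.5], [0.5]], [(1, 0), (0, 2)], [(0, 0)], 2, I2))  # 1.25
```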

In 1979, Gustafson and Kessel [8] derived necessary conditions to minimize an extension of (5) with $c$ 
different norm-inducing matrices. In 1981 Bezdek et al. [9] generalized (5) by allowing the prototypes to be 
(convex combinations of) linear manifolds of arbitrary and different dimensions. In 1985 Pedrycz [10] 
introduced a way to use partially labeled data with (5) that amounts to a mixed supervised-unsupervised 
clustering scheme. In 1989 Dave [11] introduced a generalization of (5) that uses hyperspherical prototypes 
for $v$. In 1990 Bobrowski and Bezdek [12] used the city block and sup norms with (5), thus extending the 
c-Means algorithms to the most important Minkowski norms ($p = 1$ and $p = \infty$). 

Necessary conditions that define iterative algorithms for (approximately) minimizing $J_m$ and its generalizations 
are known. Our interest lies with the cases represented by (4) and (5). The conditions that are necessary for 
minima of $J_1$ and $J_m$ follow: 

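Hard c-Means (HCM) Theorem [2]. $(U, v)$ may minimize $\sum_{k=1}^{n}\sum_{i=1}^{c} u_{ik}(\|x_k - v_i\|_A)^2$ only if 

$$u_{ik} = \begin{cases} 1, & (\|x_k - v_i\|_A)^2 = \min_{1 \le j \le c}\{(\|x_k - v_j\|_A)^2\} \\ 0, & \text{otherwise} \end{cases} \qquad (6a)$$

$$v_i = \sum_{k=1}^{n} u_{ik}\, x_k \Big/ \sum_{k=1}^{n} u_{ik} \qquad (6b)$$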


Note that HCM produces hard clusters $U \in M_{cn}$. The HCM conditions are necessary for "minima" of (4) (i.e., 
with $A = I$, the Euclidean norm on $\mathbb{R}^s$), and, as we shall note, are also used to derive hard clusters in the KSO 
algorithm. The well known generalization of the HCM conditions is contained in the: 
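One application of each HCM condition can be sketched as follows, assuming the Euclidean norm ($A = I$) and breaking distance ties in favor of the lowest cluster index (a tie-breaking rule the theorem leaves open):

```python
def sq_dist(x, v):
    return sum((a - b) ** 2 for a, b in zip(x, v))


def hcm_memberships(X, V):
    """(6a) with A = I: u_ik = 1 if v_i is the nearest prototype to x_k, else 0 (ties -> lowest i)."""
    c = len(V)
    U = [[0] * len(X) for _ in range(c)]
    for k, x in enumerate(X):
        i_star = min(range(c), key=lambda i: sq_dist(x, V[i]))
        U[i_star][k] = 1
    return U


def hcm_centers(X, U):
    """(6b): v_i = sum_k u_ik x_k / sum_k u_ik; assumes no empty cluster, as (1c) requires."""
    s = len(X[0])
    return [tuple(sum(row[k] * X[k][p] for k in range(len(X))) / sum(row) for p in range(s))
            for row in U]


X = [(0, 0), (0, 2), (10, 0)]
U = hcm_memberships(X, [(1, 1), (9, 0)])
print(U)                   # [[1, 1, 0], [0, 0, 1]]
print(hcm_centers(X, U))   # [(0.0, 1.0), (10.0, 0.0)]
```

Alternating these two functions is exactly the Picard iteration used by the parallel HCM algorithm.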

Fuzzy c-Means (FCM) Theorem [2]. $(U, v)$ may minimize $\sum_{k=1}^{n}\sum_{i=1}^{c} (u_{ik})^m (\|x_k - v_i\|_A)^2$ for $m > 1$ only if: 

$$u_{ik} = \left(\sum_{j=1}^{c} \left(\frac{\|x_k - v_i\|_A}{\|x_k - v_j\|_A}\right)^{2/(m-1)}\right)^{-1} \qquad (7a)$$

$$v_i = \sum_{k=1}^{n} (u_{ik})^m x_k \Big/ \sum_{k=1}^{n} (u_{ik})^m \qquad (7b)$$


The FCM conditions are necessary for minima of (5). There is an alternative equation for (7a) if one or more of 
the denominators in (7a) is zero. These equations converge to the HCM equations as $m \to 1$ from above, and 
for $m > 1$ the $U$ in FCM is truly fuzzy, i.e., $U \in (M_{fcn} - M_{cn})$. The FCM algorithms are simple Picard iteration 
through the paired variables $U$ and $v$. Because we want to compare this method to the KSO algorithm, we give 
a brief description of the FCM/HCM algorithms. 
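The two FCM conditions can be sketched in the same style, again with $A = I$. The text does not spell out the alternative to (7a) for zero distances, so the sketch uses one admissible choice (an equal split among coinciding prototypes), labeled as an assumption in the code. The second usage line shows the memberships becoming nearly hard as $m$ approaches 1.

```python
import math


def fcm_memberships(X, V, m):
    """(7a) with A = I; requires m > 1."""
    c = len(V)
    U = [[0.0] * len(X) for _ in range(c)]
    for k, x in enumerate(X):
        d = [math.dist(x, v) for v in V]
        zeros = [i for i in range(c) if d[i] == 0.0]
        if zeros:
            # A denominator in (7a) vanishes: here the membership of x_k is split equally
            # among the prototypes it coincides with (one admissible choice; an assumption).
            for i in zeros:
                U[i][k] = 1.0 / len(zeros)
            continue
        for i in range(c):
            U[i][k] = 1.0 / sum((d[i] / d[j]) ** (2.0 / (m - 1)) for j in range(c))
    return U


def fcm_centers(X, U, m):
    """(7b): v_i = sum_k (u_ik)^m x_k / sum_k (u_ik)^m."""
    s = len(X[0])
    centers = []
    for row in U:
        w = [u ** m for u in row]
        centers.append(tuple(sum(w[k] * X[k][p] for k in range(len(X))) / sum(w)
                             for p in range(s)))
    return centers


V = [(0.0, 0.0), (3.0, 0.0)]
print(fcm_memberships([(1.0, 0.0)], V, 2.0))   # approx [[0.8], [0.2]]
print(fcm_memberships([(1.0, 0.0)], V, 1.05))  # nearly hard, approaching (6a) as m -> 1
```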

(Parallel) c-Means (FCM/HCM) Algorithms 

<FCM/HCM 1>: Given unlabeled data set $X = \{x_1, x_2, \ldots, x_n\}$. Fix: $1 < c < n$; $1 \le m < \infty$ ($m = 1$ for HCM); 
a positive definite weight matrix $A$ to induce an inner product norm on $\mathbb{R}^s$; and $\varepsilon$, a small positive constant. 

<FCM/HCM 2>: Guess $v_0 = (v_{1,0}, v_{2,0}, \ldots, v_{c,0}) \in \mathbb{R}^{cs}$ (or, initialize $U_0 \in M_{fcn}$). 

<FCM/HCM 3>: For $j = 1$ to $J$: 

<3a>: Calculate $U_j$ with $\{v_{i,j-1}\}$ and (6a) or (7a); 

<3b>: Update $\{v_{i,j-1}\}$ to $\{v_{i,j}\}$ with $U_j$ and (6b) or (7b); 

<3c>: If $\max_i\{\|v_{i,j-1} - v_{i,j}\|\} \le \varepsilon$, then stop and put $(U^*, v^*) = (U_j, v_j)$; Else: Next $j$ 

This procedure is known to converge q-linearly from any initialization to a local minimum or saddle point $(U^*, v^*)$ 
of $J_m$. Note again that the update rule for the weights $\{v_i\}$ at step <3b> is a necessary condition for minimizing 
$J_m$. Moreover, all $c$ weight vectors are updated using all $n$ data points simultaneously at each pass; i.e., the 
weights $\{v_i\}$ are not sequentially updated as each $x_k$ is processed. This is why we call the above description a 
"parallel" version of c-means, as opposed to the well known sequential version. 

There is a sequential version of hard c-means (SHCM) that can be used to minimize $J_1$, and readers should be 
aware that it may produce quite different results than HCM on the same data set. One iteration of the SHCM 
algorithm is as follows: beginning with some hard $U$, the centers $\{v_i\}$ are calculated with (6b). Once the 
prototypes are known, one returns to update $U$. Beginning with $x_1$, each point is examined and moved from, 
say, cluster $i$ to cluster $j$, so as to maximize the decrease in $J_1$ (if possible). Then the two affected centers $\{v_i, 
v_j\}$ and rows $i$ and $j$ of $U$ are updated using equations (6). One complete pass of SHCM consists of testing 
each of the $n$ data points in $X$, and effecting a transfer at each point where a decrease in $J_1$ can be realized. 
SHCM terminates when a complete pass can be made without transfers. We mention this version of HCM 
because it is SHCM that most closely resembles the KSO algorithm. Figure 3 is a rough depiction of how the 
HCM method might begin; Figure 4 indicates a desirable situation at termination. In Figure 3 the initial hard 
clusters subdivide the data badly, and the overall squared error (the sum of squares of the solid line 
distances between data points and prototypes) is large; at termination, the prototypes lie "centered" in their 
clusters, the overall sum of squared errors is low, and the hard 2-partition subdivides the data "correctly" (this 
is what happens if we are lucky!). 
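The SHCM pass described above can be sketched as follows. The transfer test uses the exact change in $J_1$ when a single point moves and both affected centers are recomputed (a standard identity for sequential hard clustering); the incremental center updates are equivalent to recomputing (6b) for the two affected clusters. Skipping clusters that contain a single point is our own safeguard to keep (1c) satisfied.

```python
def shcm(X, labels, c):
    """Sequential hard c-means sketch (transfer form; one common variant, an assumption).
    labels[k] is the initial cluster of x_k; every cluster must start non-empty."""
    labels = list(labels)
    s = len(X[0])
    counts = [labels.count(i) for i in range(c)]

    def sq(x, v):
        return sum((a - b) ** 2 for a, b in zip(x, v))

    # Centers of the initial hard partition, (6b)
    V = [[sum(X[k][p] for k in range(len(X)) if labels[k] == i) / counts[i] for p in range(s)]
         for i in range(c)]
    moved = True
    while moved:                              # stop after a complete pass with no transfers
        moved = False
        for k, x in enumerate(X):
            i = labels[k]
            if counts[i] == 1:
                continue                      # never empty a cluster, so (1c) still holds
            # Exact change in J_1 if x_k moves from cluster i to j and both centers are updated:
            # removing x_k saves n_i/(n_i-1)||x_k-v_i||^2, adding it costs n_j/(n_j+1)||x_k-v_j||^2
            save = counts[i] / (counts[i] - 1) * sq(x, V[i])
            cost = {j: counts[j] / (counts[j] + 1) * sq(x, V[j]) for j in range(c) if j != i}
            j = min(cost, key=cost.get)       # transfer giving the largest decrease
            if cost[j] < save:
                ni, nj = counts[i], counts[j]
                V[i] = [a - (b - a) / (ni - 1) for a, b in zip(V[i], x)]  # x_k leaves cluster i
                V[j] = [a + (b - a) / (nj + 1) for a, b in zip(V[j], x)]  # x_k joins cluster j
                counts[i], counts[j] = ni - 1, nj + 1
                labels[k] = j
                moved = True
    return labels, V


X = [(0, 0), (0, 1), (10, 0), (10, 1)]
labels, V = shcm(X, [0, 1, 0, 1], 2)   # start from a bad 2-partition, as in Figure 3
print(labels)  # [1, 1, 0, 0]
```

Unlike the parallel version, the outcome of this procedure can depend on the order in which the points are visited.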


Figure 3. An Initial 2-Partition and Prototypes for HCM 
Figure 4. A (Benevolent) Final Configuration of 2-Partition and Prototypes for HCM 



Kohonen's method differs from the c-means approach in several important ways. First, it is not a norm-driven 
scheme. Instead, the KSO method uses the geometric notion of orientation matching, depicted in Figure 5, 
as the basic measure of similarity between data points and cluster centers. Second, there is no partition $U$ 
involved in the KSO algorithm. Instead, an initial set of cluster centers is iteratively updated without reference 
to partitions of the unlabeled data. The underlying geometry of the criterion of similarity is shown in Figure 5. 


The measure of similarity, as shown in Figure 5, is the angle between a data point $x$ and prototype $v$ (in the 
neural network community, the vectors $\{v_i\}$ are often called "weight" vectors, each one being attached to or 
identified with a "node" in the network). Information that the data set may contain about cluster shapes in 
feature space is lost (i.e., not used by $\cos\theta$); and if the data are normalized at each step to be vectors of 
length 1, as they usually are in the KSO approach, location information is lost as well. Consequently, the 
geometry favored by the KSO criterion of similarity is that of data substructures lying in angular cones emanating 
from the origin. We emphasize that in real data, either type of criterion (the c-means type norm driven 
measure or the KSO angular measure) may or may not be appropriate for matching the data. As with all 
clustering problems, the question is not "which is better?" but "which is better for this data set?" In 
order to effect comparison with the c-means model, a brief description of Kohonen's algorithm follows. 
Figure 5. Geometry of Cluster Formation in Orientation Matching Clustering Algorithms 

$$\cos\theta = \langle x, v \rangle = 1 - \tfrac{1}{2}\|x - v\|^2 \qquad (\|x\| = \|v\| = 1)$$


Kohonen's (KSO) Clustering Algorithm 

<KSO1>: Given unlabeled, "ordered" data set $X = \{x_1, x_2, \ldots, x_n\}$. Fix: $1 < c^*$; choose update scale factors 
$\{\alpha_j\}$ so that $\{\alpha_j\} \to 0$, $\sum_j \alpha_j = \infty$, $\sum_j (\alpha_j)^2 < \infty$; choose update neighborhood "radii" $\{\rho_j\} \subset \{0, 1, 2, \ldots, c^*\}$. 

<KSO2>: Guess (unit vectors) $v_0 = (v_{1,0}, v_{2,0}, \ldots, v_{c^*,0})$, each $v_{i,0} \in \mathbb{R}^s$ with $\|v_{i,0}\| = 1$. 

<KSO3>: For $j = 1$ to $J$: For $k = 1$ to $n$: 

<3a>: Find $i^*(k)$ such that $(\|x_k - v_{i^*(k),j-1}\|_I)^2 = \min_i\{(\|x_k - v_{i,j-1}\|_I)^2\}$; 

<3b>: For indices $t \in N^*(k) = \{i^*(k), i^*(k) \pm 1, \ldots, i^*(k) \pm \rho_j\}$, update $v_{t,j-1}$: 

$$v_{t,j} = \frac{v_{t,j-1} + \alpha_j (x_k - v_{t,j-1})}{\|v_{t,j-1} + \alpha_j (x_k - v_{t,j-1})\|}; \quad \text{otherwise } v_{t,j} = v_{t,j-1} \qquad (8)$$

Next $k$; Next $j$ 

We have used $c^*$ instead of $c$ in this procedure to emphasize the fact that Kohonen's method often uses 
"multiple" prototypes, in the sense that even though (unbeknownst to us!) $X$ contains only $c$ clusters, it may 
be advantageous to look for $c^* > c$ cluster centers; this is a further difference between the c-means and KSO 
strategies. This is one form of Kohonen's approach; other update rules have been used. The geometry of the 
update rule for the weight vectors in (8) is depicted in Figure 6. Thus, if we are at point $x_k$, as shown in Figure 
6, step <3a> of the KSO algorithm simply finds the current prototype ($v_{old}$) closest to $x_k$ in angle (for unit vectors, minimizing the 
angle is equivalent to the formula in <3a>). If the current center is called $v_{old} = v_{i^*(k)}$ as in Figure 6, then 
update equation (8) connects $v_{old}$ to the vector $x_k$, rotates $v_{old}$ to the new position $v_{new}$, and 
finally normalizes $v_{new}$. 
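The equivalence claimed for <3a> rests on the identity $\|x - v\|^2 = 2 - 2\langle x, v\rangle$ for unit vectors, so the prototype with the smallest angle to $x$ is also the prototype with the smallest Euclidean distance. A quick numerical check, with made-up prototypes on the unit circle; the last line also shows that normalization discards the length (location) of a data vector:

```python
import math


def normalize(x):
    r = math.sqrt(sum(t * t for t in x))
    return [t / r for t in x]


def winner_by_angle(x, V):
    """Index of the prototype with the largest cosine <x, v_i> (smallest angle)."""
    return max(range(len(V)), key=lambda i: sum(a * b for a, b in zip(x, V[i])))


def winner_by_distance(x, V):
    """Index of the prototype with the smallest squared Euclidean distance, as in <3a>."""
    return min(range(len(V)), key=lambda i: sum((a - b) ** 2 for a, b in zip(x, V[i])))


# Hypothetical prototypes on the unit circle at 0, 50, 100, ..., 350 degrees
V = [[math.cos(math.radians(a)), math.sin(math.radians(a))] for a in range(0, 360, 50)]
x = normalize([3.0, 4.0])   # about 53.1 degrees
print(winner_by_angle(x, V), winner_by_distance(x, V))  # 1 1
print(normalize([30.0, 40.0]) == x)  # True: scaling is lost after normalization
```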

The KSO procedure is exactly like SHCM in that it updates (some subset of the) prototypes sequentially after 

the examination of each data point. Figure 7 indicates the geometry of the scheme specified in <3b>; the 
basic idea is that once the prototype closest to the current data point is found, all prototypes in a 

neighborhood of the "winner" are also updated. 

Figure 6. The Geometry of Kohonen's Updating Rule 
Figure 7. KSO Updating of Prototypes in the Neighborhood $N^*(k)$ of "Winner" $v_{i^*(k)}$ 



Although the "feature web" shown in Figure 7 is conceptualized here as being in R, s , it has actually been 
displayed only the case s = 2. Kohonen has shown that this process converges, in the sense that the (v t j}-* 
{V|*} as {ctj }-»0, in the special case s=2. Moreover, the limiting {v^*} preserve a "topological ordering" property 

of the data set X on an array of output nodes associated to the weight vectors. Iteration in the KSO method 
thus trains the weight vectors {v^*} so that they preserve "order" in the output nodes. As previously noted, 

the KSO method does not use or generate a partition U of the data during training. However, once the weight 
vectors stabilize, the KSO model produces a hard U by following the nearest prototype rule below. 


More specifically, once a set of prototypes $\{v_i\}$ is found by "training" on some data set $X$ (this includes all four 
methods described above: HCM, FCM, SHCM and KSO), they can be used to label any unlabeled data set. 
For any vector $x \in \mathbb{R}^s$, the HCM equation for $u_{ik}$ defines a (piecewise linear) nearest prototype classifier: 

The Nearest Prototype Classifier Decision Rule: Given $\{v_i\}$, compute, non-iteratively, the hard c-partition 
of (any) data $X$ with HCM equation (6a): 
$$u_{ik} = \begin{cases} 1, & (\|x_k - v_i\|_I)^2 = \min_j\{(\|x_k - v_j\|_I)^2\} \\ 0, & \text{otherwise} \end{cases} \qquad (9)$$


Note that we have written (9) with the Euclidean norm. The HCM theorem, which holds for any inner product 
A-norm, suggests that such a norm might be used in the formula; however, interpretation of the subsequent decision rule as discussed 
above becomes very difficult. Thus, while it makes sense geometrically to consider variations in the norm as in 
(7) while searching for the cluster centers, it is much less clear that norms other than the Euclidean norm 
should be used during classification. Figure 8 is a rough depiction of how the KSO method might begin; 
Figure 9 shows the situation after termination of KSO, followed by a posteriori application of (9) to find an 
"optimal" hard c-partition $U$ corresponding to the final weights. A question about how rule (9) is used with the 
KSO prototypes remains: how do we, without labeled data, assign one of $c < c^*$ "real" labels to subsets of the 
$c^*$ weight vectors found by the KSO scheme? The same question applies to FCM (we still need to decide 
which of the $c$ "real" labels belongs to each prototype); the problem is just more pronounced when there are 
multiple prototypes for each class. 
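Rule (9) and the label-assignment problem just described can be sketched together. The map from the $c^*$ prototypes to the $c$ classes is a hypothetical input here (it could only come from labeled data or some additional heuristic), which is exactly the gap identified above.

```python
import math


def nearest_prototype_rule(X, V):
    """Rule (9): hard partition of X from prototypes V under the Euclidean norm.
    Returns a len(V) x n 0/1 matrix; ties go to the lowest prototype index."""
    U = [[0] * len(X) for _ in V]
    for k, x in enumerate(X):
        i_star = min(range(len(V)), key=lambda i: math.dist(x, V[i]))
        U[i_star][k] = 1
    return U


def merge_prototypes(U, proto_to_class, c):
    """Collapse a c* x n hard partition into c classes with a prototype -> class map.
    proto_to_class is hypothetical input: without labeled data there is no general
    way to obtain it."""
    W = [[0] * len(U[0]) for _ in range(c)]
    for t, row in enumerate(U):
        for k, u in enumerate(row):
            W[proto_to_class[t]][k] += u
    return W


X = [(0, 0), (1, 0), (9, 0), (10, 0)]
U = nearest_prototype_rule(X, [(0, 0), (1, 0), (10, 0)])  # c* = 3 prototypes
print(U)                                  # [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 1]]
print(merge_prototypes(U, [0, 0, 1], 2))  # [[1, 1, 0, 0], [0, 0, 1, 1]]
```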


Figure 8. Initial Configuration of Weight Vectors In the KSO Scheme 
Figure 9. Terminal Weight Vectors and an HCM Partition in the KSO Scheme 



4. DISCUSSION AND CONCLUSIONS 

First, we itemize the major differences between ISODATA and KSO : 

(D1) FCM, HCM and SHCM are intrinsic clustering methods, i.e., the partition is one of their unknowns, 
and one of their outputs is a partition of unlabeled data set $X$ which is optimal in the sense of minimizing a 
norm driven objective function. The KSO method, on the other hand, needs an a posteriori rule such as 
the nearest prototype rule at (9) to generate a partition of the data non-iteratively. We might call this an 
extrinsic clustering scheme. Moreover, without labeled data that can be used to discover which subsets 
of the $c^*$ multiple prototypes found by the KSO scheme should be identified with each of the $c$ classes 
assumed in (9), there is no general way to even implement (9) with the KSO rule. Thus, much must be 
added to KSO to make it a true clustering method. 

(D2) The data set $X$ is used differently. KSO uses the data sequentially (locally) and hence its outputs 
depend on local geometry and on the order of the labels (indices) of the data, whereas ISODATA utilizes the data globally, and 
updates both the weights and partition values in parallel at each pass. In this sense KSO is most akin to 
Sequential Hard c-Means, which is also sensitive to the ordering of labels; this is often regarded as a fatal flaw 
in clustering. 

(D3) KSO can have multiple prototypes for each class; ISODATA has but one. In clustering, the usual 
assumption is that $c$ is unknown, and one resorts to various cluster validity schemes to validate the results 
of any algorithm. Because the KSO scheme uses many prototypes without assuming an underlying "true 
but unknown" number of clusters, it offers an advantage to the user here. However, the dilemma of how to 
convert the prototypes into clusters, as discussed in (D1), persists. 

(D4) KSO uses local orientation ($\cos\theta = \langle x, v\rangle$) on the unit sphere as the measure of similarity between data and 
weights, whereas ISODATA uses cluster shape (via the eigenstructure of $A$) and location (via the lengths 
of the weights and the data) to assess (dis)similarity between the data and prototypes. Thus, the c-Means 
approach has a much more "statistical" flavor than KSO. On the other hand, KSO uses the dot product at 
each node, in the spirit of the McCulloch-Pitts neuron. Thus, local computations in the KSO scheme 
proceed on the basis assumed by many workers in neural network research, and make the KSO scheme 
more easily identifiable with this type of computational architecture. 

(D5) KSO preserves "order" in a certain sense; ISODATA does not. This property of the KSO method is 
perhaps its most interesting distinction. There is little hope that c-Means has a similar property. Since 
cognitive science assures us that one aspect of intelligence is its inherent ability to order, this aspect of 
the KSO approach again shows well in its favor. A significant line of research concerns whether or not the 
FCM/HCM models possess this, or any similar property. 

(D6) Weight updates in the KSO method are intuitively appealing; weight updates in ISODATA are 
mathematically necessary. Since the update formula in c-Means finds either real or generalized centroids, 
we might claim that this scheme is also intuitively appealing. In this regard the c-Means algorithms 
(including SHCM) have a clear theoretical advantage, at least in terms of justification of the procedure 
used. 


(D7) FCM, HCM and SHCM all solve well-defined optimization problems; KSO is a heuristic procedure. An 
interesting question about KSO is this: what function is being optimized during iteration? An answer to 
this question would be both useful and illuminating. The criterion functions that drive FCM, HCM and 
SHCM are well understood geometrically and statistically; discovery of a criterion function for Kohonen's 
algorithm might supply a great deal of insight about other properties of the algorithm and its outputs. 
(D8) KSO partitions have so far been generated with the nearest prototype rule and the Euclidean norm, 
whereas FCM, HCM and SHCM can be used with any inner product A-norm and with the two Minkowski norms $p = 1$ and $p = \infty$. Much 
research can be done on the issue of how best to use the Kohonen prototypes to find cluster 
substructure. There are many natural ways besides the nearest prototype rule to use the KSO weights 
$\{v_i\}$. For example, one could simply distribute unit memberships satisfying (1b) across the 
KSO nodes at each step using distance proportions. This generalizes Kohonen's model from a 
"neighborhood take all" to a "neighborhood share all" concept. One certainly suspects that it is possible 
to incorporate $U \in M_{fcn}$ as an unknown in the KSO approach, so that an extended KSO algorithm 
creates partitions of the data that satisfy necessary conditions for optimality, rather than, as in the current use of the HCM labeling 
rule, a heuristic afterthought. 
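The "neighborhood share all" idea can be made concrete in several ways; the sketch below is one hypothetical reading, not a method from the text. It distributes a unit of membership over all nodes in inverse proportion to squared distance (which coincides with the FCM formula (7a) for $m = 2$), and then scales the step of an (8)-style update by each node's membership. The function names and the choice of inverse squared distance are our own assumptions.

```python
import math


def normalize(x):
    r = math.sqrt(sum(t * t for t in x))
    return [t / r for t in x]


def share_memberships(x, V):
    """Hypothetical "neighborhood share all" memberships of x in every node v_i.
    Chosen proportional to inverse squared distance (the same as (7a) with m = 2),
    so they are non-negative and sum to 1, as (1b) requires."""
    d2 = [sum((a - b) ** 2 for a, b in zip(x, v)) for v in V]
    zeros = d2.count(0.0)
    if zeros:                                  # x coincides with one or more nodes
        return [1.0 / zeros if d == 0.0 else 0.0 for d in d2]
    inv = [1.0 / d for d in d2]
    tot = sum(inv)
    return [w / tot for w in inv]


def share_update(x, V, alpha):
    """Hypothetical fuzzy variant of (8): every node moves toward x by alpha * u_i,
    then is renormalized; 0 < alpha < 0.5 keeps the normalized vector nonzero."""
    u = share_memberships(x, V)
    return [normalize([p + alpha * ui * (q - p) for p, q in zip(v, x)]) for v, ui in zip(V, u)]


print(share_memberships([0.6, 0.8], [[1.0, 0.0], [0.0, 1.0]]))  # approx [1/3, 2/3]
```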


Major similarities between ISODATA and KSO include: 


(S1) If we let $(U_F, v_F)$, $(U_H, v_H)$, $(U_S, v_S)$, and $(U_K, v_K)$ denote, respectively, the pairs found by FCM, HCM, 
SHCM and KSO, we note that $(U_F, v_F)$ is a critical point for $J_m$, while $(U_H, v_H)$, $(U_S, v_S)$, and $(U_K, v_K)$ 
are, because of the HCM theorem, (possibly different) critical points of $J_1$. However, $(U_H, v_H) \ne (U_S, v_S) 
\ne (U_K, v_K)$ generally. This suggests that (i) HCM (and especially SHCM) and KSO as described herein are 
most definitely related, and (ii) there should be a generalized (fuzzy) KSO that bears the same 
relationship to FCM that the hard c-Means versions bear to the current version of KSO. It seems clear 
that there is a stronger mathematical link between FCM/HCM and KSO than is currently known. 
Connection of the two approaches begins with careful formulation of a constrained optimization problem 
that holds for KSO. This involves finding a global KSO criterion function and necessary conditions that 
require the calculation of the weight vectors $\{v_i\}$ as in KSO <3b>. 


(S2) Both algorithms find prototypes (weights or cluster centers) in the data that provide a compressed 
representation of it, and enable nearest prototype classifier design. Recent work by Huntsberger and 
Ajjimarangsee [13] indicates that FCM is at least as good as KSO in terms of minimizing apparent error 
rates. Furthermore, FCM sometimes generates identical solutions to KSO on various well known data 
sets. This is another powerful indicator of the underlying (unknown) relationship between the KSO and 
c-Means methods. Much can be done empirically to confirm or deny specific relationships between the 
two methods. 

We have itemized some similarities and differences between two approaches to the clustering of unlabeled 
data, Hard/Fuzzy c-Means and Kohonen's self-organizing feature maps (KSO), and posed some questions 
concerning each method. Successful resolution of these questions will benefit both models. Numerical 
convergence properties and the neural-like behavior of both the extended KSO and FCM algorithms should 
be established. Issues to be studied should include robustness, adaptivity, parallelism, apparent error rates, 
time and space complexity, type and rate of convergence, optimality tests, and initialization sensitivity. 